Add option to provide stats to make_interleaved_dataset and skip keys during norm #62

Merged 6 commits on Mar 14, 2024

kpertsch
Collaborator

Small data loader changes we found helpful during DROID training:

  • add an optional argument to pass dataset_statistics to make_interleaved_dataset; these stats are then used for normalization (we still compute per-dataset stats for weight & thread balancing). Useful if you e.g. want to do co-training and normalize all datasets with the same stats
  • add an argument to make_dataset_from_rlds that allows skipping keys during normalization (e.g. in DROID we don't normalize proprio)
  • add a utility to compute aggregate dataset stats across a list of stats (e.g. for the co-training setting)
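
For readers who want a concrete picture of the last two points, here is a rough sketch, not the code in this PR. It assumes each stats dict holds a per-dimension "mean" and "std" for "action" and "proprio" plus a "num_transitions" count; the names combine_stats_sketch, normalize_sketch, and skip_keys are hypothetical, and plain Python lists stand in for the arrays a real loader would use. The pooling weights each dataset by its transition count and recovers the combined std from the pooled second moment.

```python
import math


def combine_stats_sketch(stats_list, keys=("action", "proprio")):
    """Pool per-dataset mean/std into one set of stats (hypothetical helper)."""
    total = sum(s["num_transitions"] for s in stats_list)
    combined = {"num_transitions": total}
    for key in keys:
        dim = len(stats_list[0][key]["mean"])
        mean = [0.0] * dim
        second_moment = [0.0] * dim
        for s in stats_list:
            weight = s["num_transitions"] / total
            for i, (m, sd) in enumerate(zip(s[key]["mean"], s[key]["std"])):
                mean[i] += weight * m
                # Per dataset, E[x^2] = std^2 + mean^2 (population std assumed).
                second_moment[i] += weight * (sd * sd + m * m)
        # Pooled variance = pooled E[x^2] - pooled mean^2, clamped at zero.
        std = [math.sqrt(max(q - m * m, 0.0)) for q, m in zip(second_moment, mean)]
        combined[key] = {"mean": mean, "std": std}
    return combined


def normalize_sketch(traj, stats, skip_keys=(), eps=1e-8):
    """Normalize each key of traj with shared stats, leaving skip_keys untouched."""
    out = dict(traj)
    for key, rows in traj.items():
        if key in skip_keys or key not in stats:
            continue
        mean, std = stats[key]["mean"], stats[key]["std"]
        out[key] = [
            [(x - m) / (s + eps) for x, m, s in zip(row, mean, std)]
            for row in rows
        ]
    return out


# Two toy datasets with 1-D actions and proprio.
stats_a = {"num_transitions": 2,
           "action": {"mean": [1.0], "std": [1.0]},
           "proprio": {"mean": [0.0], "std": [1.0]}}
stats_b = {"num_transitions": 4,
           "action": {"mean": [4.0], "std": [0.0]},
           "proprio": {"mean": [0.0], "std": [1.0]}}

shared = combine_stats_sketch([stats_a, stats_b])
traj = {"action": [[0.0], [2.0]], "proprio": [[5.0], [6.0]]}
normalized = normalize_sketch(traj, shared, skip_keys=("proprio",))
```

In a co-training setup, the shared stats would be computed once and reused for every dataset, while keys listed in skip_keys (proprio in the DROID case) pass through unchanged.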

@kvablack would be great to merge this ASAP if you have time to take a look!

octo/data/dataset.py (outdated review thread, resolved)
@kpertsch kpertsch merged commit 8559a70 into main Mar 14, 2024
1 check passed
@kpertsch kpertsch deleted the droid_changes branch March 14, 2024 22:05
WenchangGaoT pushed a commit to WenchangGaoT/octo1 that referenced this pull request May 10, 2024
2 participants